NSF PAR Search | NSF Public Access Repository



Beyond Raw Bytes: Towards Large Malware Language Models

Kurlandski, Luke; Berger, Harel; Pan, Yin; Wright, Matthew (February 2026, NDSS 2026)

Malware poses an increasing threat to critical computing infrastructure, driving demand for more advanced detection and analysis methods. Although raw-binary malware classifiers show promise, they are limited in their capabilities and struggle to model long sequences. Meanwhile, the rise of large language models (LLMs) in natural language processing showcases the power of massive, self-supervised models trained on heterogeneous datasets, which offer flexible representations for numerous downstream tasks. The success of these models rests on the size and quality of their training data, the expressiveness and scalability of their neural architectures, and their ability to learn from unlabeled data in a self-supervised manner. In this work, we take the first steps toward developing large malware language models (LMLMs), the malware analog to LLMs. We tackle the core aspects of this objective: data, models, pretraining, and finetuning. By pretraining a malware classification model with language modeling objectives, we improved downstream performance on diverse practical malware classification tasks by 1.1% on average and by up to 28.6%, indicating that these models could serve as successors to raw-binary malware classifiers.
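The abstract says the classifier is pretrained "with language modeling objectives" but does not name them. As a minimal sketch, assuming a BERT-style masked-token objective applied directly to raw bytes, the code below shows how one training example could be corrupted and labeled. The special token id `MASK_ID = 256`, the 15% masking rate, the 80/10/10 replacement split, and the helper name `mask_bytes` are all hypothetical, borrowed from common masked-language-model practice rather than taken from the paper.

```python
import random
from typing import List, Tuple

MASK_ID = 256   # hypothetical special token placed after the 256 byte values
IGNORE = -100   # label meaning "no loss at this position" (common PyTorch convention)


def mask_bytes(data: bytes, mask_prob: float = 0.15,
               seed: int = 0) -> Tuple[List[int], List[int]]:
    """BERT-style masking of a raw byte sequence (illustrative assumption).

    Returns (inputs, labels): inputs is the corrupted token sequence,
    labels holds the original byte at each selected position and IGNORE elsewhere.
    """
    rng = random.Random(seed)
    inputs = list(data)
    labels = [IGNORE] * len(inputs)
    for i, b in enumerate(data):
        if rng.random() >= mask_prob:
            continue                         # position not selected for prediction
        labels[i] = b                        # model must recover the original byte
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID              # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(256)   # 10%: replace with a random byte
        # remaining 10%: keep the original byte unchanged
    return inputs, labels


# Usage: the typical first 16 bytes of a Windows PE file ("MZ" DOS header).
sample = bytes.fromhex("4d5a90000300000004000000ffff0000")
inputs, labels = mask_bytes(sample, seed=1)
```

A model would then be trained to predict `labels[i]` wherever it is not `IGNORE`; unselected positions contribute no loss. Keeping the objective self-supervised is what lets unlabeled binaries serve as pretraining data, which is the property the abstract highlights.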
Free, publicly-accessible full text available February 24, 2027
Beyond Raw Bytes: Towards Large Malware Language Models

https://doi.org/10.5281/zenodo.17047101

Kurlandski, Luke; Wright, Matthew; Pan, Yin; Berger, Harel (September 2025, Zenodo)

Dataset and Code for the NDSS 2026 paper on Large Malware Language Models
Do Fear the REAPIR: Adversarial Malware from API Replacement

https://doi.org/10.1109/EuroSPW67616.2025.00013

Kurlandski, Luke; Mosli, Rayan; Pan, Yin; Thianphan, Sirapat; Wright, Matthew (June 2025, IEEE)

Free, publicly-accessible full text available June 30, 2026
They Might NOT Be Giants: Crafting Black-Box Adversarial Examples Using Particle Swarm Optimization

https://doi.org/10.1007/978-3-030-59013-0_22

Mosli, Rayan; Wright, Matthew; Yuan, Bo; Pan, Yin (September 2020, Lecture Notes in Computer Science)
As machine learning is deployed in more settings, including security-sensitive applications such as malware detection, the risks posed by adversarial examples that fool machine-learning classifiers have grown. Black-box attacks are especially dangerous because they require only the ability to query the target model and observe the labels it returns, without knowing anything else about the model. Current black-box attacks either have low success rates, require many queries, produce adversarial images that are easily distinguishable from their sources, or offer little control over the outcome of the attack. In this paper, we present AdversarialPSO, a black-box attack that uses few queries to create adversarial examples with high success rates. AdversarialPSO is based on Particle Swarm Optimization, a gradient-free evolutionary search algorithm, with special adaptations that make it effective in the black-box setting. It is flexible in balancing the number of queries submitted to the target against the quality of the adversarial examples. We evaluated AdversarialPSO on CIFAR-10, MNIST, and ImageNet, achieving success rates of 94.9%, 98.5%, and 96.9%, respectively, while submitting numbers of queries comparable to prior work. Our results show that black-box attacks can be adapted to favor fewer queries or higher-quality adversarial images while still maintaining high success rates. Code is available at https://github.com/rhm6501/AdversarialPSOImages.
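To make the core idea concrete, here is a generic particle swarm optimization (PSO) loop that searches for an adversarial perturbation using only label queries. It is a minimal sketch, not AdversarialPSO itself: it omits the paper's adaptations for reducing queries and controlling image quality. The penalty-based fitness (perturbation norm plus a large constant when the label does not change), the box bound, the swarm hyperparameters, and the toy linear classifier `toy_model` are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def pso_label_only_attack(
    query: Callable[[Sequence[float]], int],
    x: Sequence[float],
    bound: float = 5.0,
    n_particles: int = 30,
    iters: int = 200,
    w: float = 0.7298,    # inertia weight (common constriction-derived value)
    c1: float = 1.49618,  # pull toward each particle's own best position
    c2: float = 1.49618,  # pull toward the swarm's best position
    seed: int = 0,
) -> Optional[Vector]:
    """Search for a small delta with query(x + delta) != query(x), observing labels only.

    Fitness = L2 norm of delta, plus a large penalty if the label is unchanged,
    so the swarm seeks the smallest label-flipping perturbation in the box.
    Returns the best adversarial delta found, or None if none was found.
    """
    rng = random.Random(seed)
    dim = len(x)
    y0 = query(x)
    penalty = 1e3

    def fitness(d: Vector) -> float:
        norm = math.sqrt(sum(di * di for di in d))
        flipped = query([xi + di for xi, di in zip(x, d)]) != y0
        return norm if flipped else norm + penalty

    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # keep the perturbation inside the allowed box
                pos[i][k] = max(-bound, min(bound, pos[i][k] + vel[i][k]))
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest if gbest_f < penalty else None


def toy_model(v: Sequence[float]) -> int:
    """Hypothetical black box: a linear rule that reveals only its label."""
    return 1 if v[0] + v[1] > 0 else 0


delta = pso_label_only_attack(toy_model, [1.0, 1.0])
```

In this toy case the smallest label-flipping perturbation has L2 norm sqrt(2), the distance from (1, 1) to the line x0 + x1 = 0, so the returned delta should have a norm at or somewhat above that. The loop spends n_particles × (iters + 1) queries, far more than a query-efficient attack would; balancing query count against perturbation quality is the trade-off the abstract says AdversarialPSO is designed to control.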
Full Text Available
